Critical to precise quantitative research is reliability
estimation. Researchers have limited tools, however, to assess
the reliability of evolving instruments. Consequently, cursory
assessment is typical and in-depth evaluation is rare. This
paper presents a rationale for and description of PIAS, a
computerized instrument analysis system. PIAS makes two major
contributions to measurement theory and practice: (1) PIAS is
a collection of most of the routines necessary for such an
analysis; and (2) PIAS provides unique output
allowing the user to identify the most efficient combination
of items; i.e., the smallest number of items with the highest
reliability.
The Development of a Computerized System for the Estimation of
Reliability for Measurement Systems Employing Interval or
Ratio Data

by D. Thomas Porter



A cardinal article of faith in measurement theory and practice
is reliability estimation. The theoretical and practical value
of any research finding is linked inextricably to the internal
and external validity of the methodology employed to extract that
finding. Validity must in turn be supported by, among other
things, a reliable measurement system. Thus when a researcher
fails to support the reliability of his measurements, the validity
of his conclusions is questionable and their value indiscernible.
In short, the determination of reliability is a fundamental
requisite for producing research which has any practical and/or
theoretical value. A researcher is left at a disadvantage,
however, as there are few, if any, complete, user-oriented, and
efficient computer packages for instrument analysis. Consequently,
researchers are prone to conduct cursory evaluations of their
instruments.

Personal experience of the author suggests that cursory
instrument evaluations are common because extant programs must be
accessed independently, and even if all these programs are employed,
their combined output is typically deficient. To conduct a
complete evaluation, a researcher must access six or seven
different computer packages and programs. Unfortunately, the
path of least resistance is usually taken. An average inter-item
correlation coefficient is calculated, plugged into Nunnally's
formula, and that is it. Such practice is hardly sound
measurement technique.

The fundamental premise of PIAS is that the simple computation
of a reliability coefficient is an insufficient estimation of an
instrument's reliability and is in no sense a complete
instrument analysis. In practice several peripheral indices are used to
illustrate and/or support the reliability of a measurement system.
Correlations with the total score, item discrimination indices,
numerous factor analyses, and split-half reliability checks are
common practice. When all of these data are gathered, however,
two problems remain. First, several "canned" programs must be
accessed, and if they fail to give sufficient information, then
additional, time-consuming, original programs must be written,
documented, and validated. Even then, the integration of these
outputs is often tedious and inefficient. Second, this
information does not answer the following questions:

a) Could higher reliability be achieved with a smaller,
select combination of items?

b) Could the same reliability be achieved with a smaller,
select combination of items?
Central to measurement systems development is measurement
efficiency. This goal is particularly important in research where
respondent time is at a premium. When new instruments are being
tested for the first time, measurement efficiency is even more
important. If the same (or higher) reliability and validity can
be achieved with 20 items as with 40, then significant research
resources can be saved. Normally, a decrease in the number of
items causes a decrease in reliability. Such is not always the
case, however, as deletions of certain items may, in fact,
increase reliability. Unfortunately, current programs and algorithms
do not provide such information. With the goals of complete
instrument analysis and instrument efficiency in mind, PIAS was
written, documented, and validated.



Reliability

Metaphorically, reliability is a measure of the extent to
which a set of scores is consistent over time and consistent
internally. Theoretically, reliability is the ratio between
true scores and error. Accordingly, reliability is a function of
two factors: internal consistency and stability. Internal
consistency is the extent to which components are consistent with each
other. For example, the responses to the questions on an
attitude questionnaire should correlate moderately with each other if
the questionnaire is consistent internally. Internal consistency
is examined typically by "split-half" correlations, average
inter-item correlations, correlations with the total score, or the
ability of an individual item to discriminate significantly
between a group of high and low scorers (as defined by the total
score).

Stability is the extent to which the scores on a measurement
system can be produced again at another administration of the
system. Stability (commonly called "test-retest" reliability) is
usually operationalized by administering the measure at one time
and correlating the responses with responses at another time. The
adequacy of this procedure is dependent upon several factors. One,
the interval(s) of time between administrations is (are) critical.
Small intervals allow respondents to remember their former responses
and thus artificially inflate the correlation(s) between
administrations. Two, stability estimates assume the construct being measured
is a trait construct; i.e., corresponding constructs and measures
are stable over time. State constructs, on the other hand, and their
measures are designed to reflect sensitive environmental changes on
purpose. Unless the researcher can control explicitly the environment
of the administrations, the stability of state constructs and
measures may be difficult to assess. Three, the design of a study
may place more emphasis upon the stability of a measurement system.
Any research which employs a measure of change (e.g., from a pretest
to a posttest) assumes that differences occurring over time are
not a function of the instability of the measurement system and
are a function of the independent variable.
Stability is used in this context instead of "test-retest"
reliability for a very important reason. "Test-retest" procedures
may give the researcher a false sense of confidence about a
measure's stability. If the measure and its construct are to be
generalized to more than two points in time, then stability
assessment should comprise more than the typical two-administration
paradigm. When the measure is used as a predictor of other measures
or constructs for a span of several years (e.g., student placement
in college), then multiple administrations are absolutely critical.

Overall then, reliability is a function of a measurement
system's internal consistency and its stability. Nunnally (1967,
page 193) has operationalized this relationship mathematically. He
concludes that it "[...] overemphasized in its
importance for measurement theory."



$$\text{Reliability} = \frac{K r}{1 + (K - 1) r}$$

In the above formula, Nunnally has incorporated both
components of reliability. K represents the number of items
in the measurement system, whereas r is the average inter-item
correlation coefficient between components. Stability is reflected
in K and internal consistency is reflected in r. The larger the K
value, the smaller the extent to which any one item's aberrations can change
the overall scores at a later administration and thus, the higher
the stability of the instrument. The higher the value of r, the
greater the degree of internal consistency between items. This
formula assumes that the measurement system is designed to measure
one construct. If it measures more than one construct, then the
reliability will be lower as the degree of independence between
constructs grows. Accordingly, sub-constructs or sub-tests are
assessed individually as to their reliability.
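
As an illustration of how K and r trade off in the formula above, the
following is a minimal Python sketch. It is not part of PIAS (which is
written in FORTRAN); the function name nunnally_reliability is a
hypothetical label chosen for this example.

```python
def nunnally_reliability(k: int, r: float) -> float:
    """Reliability of a k-item measure whose average inter-item
    correlation is r, using Reliability = K*r / (1 + (K - 1)*r)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return (k * r) / (1.0 + (k - 1) * r)


if __name__ == "__main__":
    # Holding r at .25, reliability rises as items are added.
    for k in (5, 10, 20, 40):
        print(k, round(nunnally_reliability(k, 0.25), 3))

    # Fewer, more consistent items can match or beat a longer scale:
    # 20 items at r = .30 give about .896; 40 items at r = .15 give about .876.
    print(nunnally_reliability(20, 0.30) > nunnally_reliability(40, 0.15))
```

The last comparison illustrates the point about measurement efficiency
made earlier: a shorter scale is not necessarily a less reliable one.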

The reliability coefficient which results will range in value
from 0.0 to 1.0, with 0.0 indicating no reliability and 1.0
indicating perfect reliability. The number of observations
taken to estimate reliability does not directly affect the
size of the reliability coefficient. If the number of observations
is small or selected non-randomly, however, the variability of the
items' scores may be small and deflate the coefficient accordingly.
In addition, the degree of confidence that the obtained coefficient
represents the true reliability increases with the
number of observations collected; the larger the sample, the better.

The "adequacy" of a reliability coefficient is a difficult
matter for researchers; for the most part, it is largely
subjective. In some cases, the "adequacy" issue is purely academic;
a given reliability may be all that is feasible. Whenever possible,
however, the researcher should try to increase reliability. There
are some objective criteria which should be considered when asking
"Is my reliability high enough?" For example, a coefficient of
.70 indicates that the instrument accounts for about 49% (the
coefficient squared times 100) of the variance within the
measurement system; or, in other words, over half of the internal
variance is extraneous. In this instance a researcher would
normally want to improve the instrument's reliability. One should
also realize that low reliability tends to lower the probability
that a null hypothesis will be rejected. In other words, low
reliabilities tend to make hypothesis testing more conservative
and reduce the efficiency of the ratio between research resources
expended and meaningful results obtained. When prediction of
other variables is the purpose of a measurement system, then low
reliabilities are even more critical. In this last case,
inaccurate and imprecise predictions will be more probable.

Of course, reliability is only one part of the total
research process. Once sufficient reliability estimates are
obtained, the more important questions of validity enter. But
without sufficient reliability, validity is, by definition, a
moot question. In short, reliability and instrument analysis are
only a first, but necessary, step in scientific research.

Program PIAS 

PIAS is a multi-phasic, computerized instrument analysis
system. It is user-oriented and requires that the user know how
to format the data (tell PIAS where to find the data upon input).
It provides five different analyses, each of which gives
information as to an instrument's reliability. In addition, PIAS
provides many descriptive statistics and, when appropriate, a test
for additivity of items. A typical use would be the analysis
of a Likert-type attitude scale where the user needs to know
what items are decreasing reliability, if any, and what is the
most efficient cluster of items; i.e., what combination of items
gives the highest reliability with the smallest number of items.
For a complete list of inputs and outputs, see Tables One and
Two.

Phase I of PIAS is descriptive in nature. For each component
in the measurement system Phase I provides means, standard
deviations, standard errors of the mean, kurtosis and skewness
values, probabilities of kurtosis and skewness values, highs,
lows, and ranges of each item. If the user desires, distribution
plots of each item will also be provided. In addition it provides
a correlation matrix, a matrix of t and p values, and a correlation
matrix assessment. In this phase PIAS notes and stores all items
which failed to correlate significantly with greater than 60
per cent of the other items. (NOTE: at this point and all others
where tests of significance are conducted, adjustments are made
to account for multiple tests. This is accomplished by dividing
the user's input alpha by the number of tests to be conducted,
i.e., the Bonferroni adjustment.)
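
The Bonferroni adjustment and the 60 per cent screening rule described
above can be sketched as follows. This is an illustrative Python sketch,
not the PIAS source. The function names, the choice to count one test per
distinct item pair, and the convention that a p value at or above the
adjusted alpha counts as a failure are all assumptions made for this
example.

```python
def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Divide the user's alpha by the number of tests conducted."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def weakly_correlated_items(p_values, alpha, max_fail_share=0.60):
    """Return indices of items that fail to correlate significantly
    with more than max_fail_share of the other items.

    p_values is a square, symmetric list of lists holding the p value
    of each inter-item correlation; the diagonal is ignored.
    Assumption: one test per distinct pair of items.
    """
    k = len(p_values)
    if k < 2:
        raise ValueError("at least two items are required")
    adjusted = bonferroni_alpha(alpha, k * (k - 1) // 2)
    flagged = []
    for i in range(k):
        fails = sum(1 for j in range(k)
                    if j != i and p_values[i][j] >= adjusted)
        if fails / (k - 1) > max_fail_share:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical p values: items 0-2 correlate with one another,
    # item 3 correlates with none of them.
    p = [[0.0, 0.001, 0.001, 0.20],
         [0.001, 0.0, 0.001, 0.20],
         [0.001, 0.001, 0.0, 0.20],
         [0.20, 0.20, 0.20, 0.0]]
    print(weakly_correlated_items(p, alpha=0.05))
```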

Phase II is concerned with additivity of items in the measure.
Since many scales are comprised of items which are assumed to be
equal in importance and are later added together to form a total
score, this phase is very important for instrument development.
Phase II conducts a test of significance between all meaningful
pairs of item means, reports the t and p values, and then prints a
summary table describing which items differed significantly from
which other items. Finally, a one-way ANOVA is conducted across
the total group of items to give an indication of additivity.
Homogeneity of variance indices are provided with this test.
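
The one-way ANOVA across items mentioned above can be illustrated with
a short Python sketch. It computes only the F ratio and its degrees of
freedom, treating each item's scores as one group; the p value lookup and
the homogeneity of variance indices that PIAS reports are omitted. The
function name is hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups is a list of lists, one list of scores per item.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and n_total > k")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-groups and within-groups sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variance; F is undefined")
    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


if __name__ == "__main__":
    # Three hypothetical items answered by three respondents each.
    print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```

A large F relative to its critical value would suggest that the item means
differ, which bears on whether unstandardized items can reasonably be summed.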

Phase III of PIAS is concerned with what the most efficient
combination of items is with respect to reliability. Each item
is rank-ordered by its degree of contribution to the reliability
coefficient. This calculation is accomplished by a test of
significance between the average correlation of one item with
all other items and the average inter-correlation of all items.
Any item whose average correlation with other items is significantly
lower than the average inter-item correlation coefficient is
so noted with an asterisk. Phase III ends with a description
of the most efficient combination of items.
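
The search for an efficient combination of items can be illustrated
with the sketch below. It is a simplified stand-in, not the PIAS
algorithm: instead of the significance test described above, it greedily
drops the item with the lowest average correlation with the remaining
items for as long as Nunnally's reliability does not fall. All function
names are hypothetical.

```python
def reliability(k, r):
    """Nunnally's formula: K*r / (1 + (K - 1)*r)."""
    return k * r / (1 + (k - 1) * r)


def mean_inter_item_r(corr, items):
    """Average correlation over all distinct pairs among `items`."""
    pairs = [(i, j) for a, i in enumerate(items) for j in items[a + 1:]]
    return sum(corr[i][j] for i, j in pairs) / len(pairs)


def most_efficient_items(corr):
    """Greedy backward deletion over a square correlation matrix.

    Returns (kept item indices, reliability of the kept items).
    An item is dropped only if reliability stays the same or rises.
    """
    items = list(range(len(corr)))
    if len(items) < 2:
        raise ValueError("at least two items are required")
    best = reliability(len(items), mean_inter_item_r(corr, items))
    while len(items) > 2:
        # Candidate: the item least correlated with the others kept so far.
        weakest = min(items,
                      key=lambda i: sum(corr[i][j] for j in items if j != i))
        trial = [i for i in items if i != weakest]
        trial_rel = reliability(len(trial), mean_inter_item_r(corr, trial))
        if trial_rel < best:
            break
        items, best = trial, trial_rel
    return items, best


if __name__ == "__main__":
    # Hypothetical matrix: items 0-2 intercorrelate at .5, item 3 at .0.
    corr = [[1.0, 0.5, 0.5, 0.0],
            [0.5, 1.0, 0.5, 0.0],
            [0.5, 0.5, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]
    print(most_efficient_items(corr))
```

In this hypothetical matrix the full four-item scale has a reliability of
about .57, while the three consistent items alone reach .75.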

Phase IV is concerned primarily with internal consistency.
Two forms of output are provided. The first output is
correlations with the total score. This analysis is based upon the
assumption that the best estimate of the true score for a given
case is the total score of all items. Any item which fails to
correlate with the total score should be viewed as a questionable
item (assuming a uni-dimensional measure). The second output
is an indication of each item's ability to discriminate between
high scorers and low scorers. Any item which could not do so
would also be questionable from an internal consistency point
of view.
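
Both Phase IV outputs can be approximated with a few lines of Python.
The sketch below is illustrative only. It correlates each item with the
total of all items, then compares item means between the highest- and
lowest-scoring cases. The 27 per cent default split and the function names
are assumptions for this example; PIAS takes its classification factor
from the user and applies a significance test, which is omitted here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_total_correlations(data):
    """data: one row per case, one column per item."""
    totals = [sum(row) for row in data]
    return [pearson([row[j] for row in data], totals)
            for j in range(len(data[0]))]


def discrimination_gaps(data, fraction=0.27):
    """Mean of each item among high scorers minus its mean among
    low scorers, with groups defined by the total score."""
    ranked = sorted(data, key=sum)
    g = max(1, int(len(ranked) * fraction))
    low, high = ranked[:g], ranked[-g:]
    return [sum(r[j] for r in high) / g - sum(r[j] for r in low) / g
            for j in range(len(data[0]))]


if __name__ == "__main__":
    # Hypothetical responses: items 0 and 1 track the total, item 2 does not.
    data = [[1, 1, 3], [2, 2, 1], [4, 4, 3], [5, 5, 1]]
    print(item_total_correlations(data))
    print(discrimination_gaps(data, fraction=0.5))
```

An item with a near-zero correlation and a near-zero gap, such as the third
item in the hypothetical data, is the kind of item Phase IV would mark as
questionable.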

Phase V serves as a summary. At this point PIAS
calculates what the reliability would be if items failing to
meet the criteria specified in Phases II-IV were deleted. Phase
V also gives the overall reliability and any additional
descriptive statistics on total scores, if appropriate.

PIAS is written in FORTRAN for the Cyber 173 NOS 1.1
operating system at the State University of New York at Buffalo
(RUNW compiler). It employs 19 subroutines and requires a field
length of 44300 octal plus sufficient field length (core memory)
for the FORTRAN compiler. Although developed on the CDC Cyber
173, which has a 60-bit word capacity, PIAS was written to
accommodate machines with 32-bit or larger word capacities.
The source deck is approximately 2000 cards in length.

Because reliability estimation is a paramount concern for
researchers interested in conducting quality research and because
current programs are incomplete and disorganized, PIAS was
developed. PIAS allows the user not only to estimate reliability
and completely assess his instrument, but also to identify the
precise combination of items which gives the most efficient
reliability. Researchers, like most human beings, often follow
paths of least resistance. If editors do not insist upon
reliability information, researchers will rarely provide it.
Correspondingly, if improving reliability and conducting
instrument analyses means several trips to the computer center
because of several disjointed programs and analyses, then
cursory instrument evaluations will continue to be the norm.
The author sincerely hopes users find PIAS useful.



Table One 
Inputs for Program PIAS 



1) Six character job name for analysis.

2) The number of cases (subjects, n); needed only if input
is from cards. n must be greater than 1 and less than
3001.

3) The number of items (components, i) in the measurement
system. i must be greater than 1 and less than 101.

4) The alpha the user wants to maintain in the analyses.
Alpha must be greater than zero and less than .26.

5) Whether punched card output is desired. If so, whether
sums of standardized or non-standardized items are desired.

6) Whether a listing of input data is desired.

7) Whether a heading for the analyses is to be read in and
printed.

8) Whether non-standardized items can be logically summed or
whether items should be standardized (converted to Z-scores)
before summing across items. (If neither, some of the
output listed in Table Two will not be produced.)

9) Whether cases in high and low categories of scorers are
to be printed.

10) Whether distributions of all items are to be plotted.

11) The source of input (cards, file, or magnetic tape).

12) Effect size for difference testing.

13) Effect size for correlational testing.

14) Discrimination analysis classification factor.

15) A FORTRAN-type format for the data input source.

16) A heading card (optional). 
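
The numeric limits stated in items 2 through 4 above lend themselves to
a simple legality check of the kind PIAS performs on its inputs. The
following Python sketch is illustrative only; the function name and the
wording of the messages are hypothetical, and only the three limits given
in Table One are checked.

```python
def check_parameters(n_cases: int, n_items: int, alpha: float):
    """Return a list of problems; an empty list means the three
    parameters fall within the limits stated in Table One."""
    problems = []
    if not 1 < n_cases < 3001:
        problems.append("number of cases must be greater than 1 and less than 3001")
    if not 1 < n_items < 101:
        problems.append("number of items must be greater than 1 and less than 101")
    if not 0 < alpha < 0.26:
        problems.append("alpha must be greater than zero and less than .26")
    return problems


if __name__ == "__main__":
    print(check_parameters(200, 20, 0.05))   # legal values: no problems
    print(check_parameters(1, 101, 0.26))    # each value is out of range
```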



Table Two 
Output Products of Program PIAS 



Descriptive Data for Each Item of the Instrument: 

A1) Means and standard deviations.
A2) Sums and standard errors of the mean.
A3) Skewness and kurtosis values.
A4) Probabilities of skewness and kurtosis values.
A5) High, low, and range characteristics.
A6) Means ± standard deviations.
A7) Correlation matrix; t values and p values associated
    with same; alpha is adjusted for multiple tests.
A8) Distribution plots for each item (optional).

Reliability Information for the Instrument: 

B1) Rank order of each item's contribution to the overall
    reliability coefficient; most efficient group of items.
B2) Correlation matrix assessment which identifies all items
    not correlating with any one particular item.
B3) Assessment of items which differ significantly from each
    other (optional).
B4) Analysis of variance for item additivity (optional).
B5) Correlations of each item with the total score (optional).
B6) Discrimination analyses for each item (i.e., items'
    ability to discriminate significantly between groups
    of high and low scorers, optional).

Peripheral Output: 

C1) Parameter check to ensure all input parameters are legal.
C2) Heading for the analysis (optional).
C3) Listing of input data (optional).
C4) Punched card output of standardized or non-standardized
    items' sums, identification data, and case numbers (optional).
C5) Listing of cases, their summed scores, and identification
    data in the high and low categories.
C6) Reliability coefficients with increased and decreased
    numbers of items/components.
C7) Summary of analysis:
    C7a) Reliability of measurement system, average inter-item
         correlation coefficient among items.
    C7b) Reliabilities if items designated by criteria above
         (B1 to B6) were deleted.
    C7c) Mean, standard deviation, standard error, kurtosis,
         skewness, percentiles, quartiles, and distribution
         plot of the total scores (conditional).



UTILIZATION AGREEMENT AND ORDER FORM FOR PROGRAM PIAS



Program PIAS is intended for use only by non-profit and non-military
institutions and individuals. Any deviation from this principle without the
expressed written consent of the author is strictly prohibited by the copyright
for PIAS and written agreement herein. If use of same constitutes a non-profit
use, permission is necessary for legal use of PIAS. For further information
write or call:

Dr. D. Thomas Porter 

4226 Ridge Lea Road 
State University of New York, Buffalo 
Buffalo, New York 14226 
716-838-3208 or 716-831-1607 

Please use the form below for ordering manuals, source decks, or 
requesting services. Please print or type. 



Name 



Institution
Address



Zip



I agree that the use of program PIAS will be for non-profit and
non-military research and/or instructional purposes only.

Signed

Check desired items: 

Item           Quantity   Price         Total
Manual(s)                 $4.00 each
Source Deck    1          $60.00                 029 or 026 punch (circle one)
Installation*  n/a        $275.00

Grand Total $
Check or money order enclosed (no cash)
Please bill me.



*Price of installation covers cost of source deck and Ji manuals, but it does 
not cover travel and per diem expenses which are to be borne by the purchasing 
institution or individual. 